Observation weights unlock bulk RNA-seq tools for zero inflation and single-cell applications
Dropout events in single-cell RNA sequencing (scRNA-seq) cause many transcripts to go undetected and induce an excess of zero read counts, leading to power issues in differential expression (DE) analysis. This has triggered the development of bespoke scRNA-seq DE methods to cope with zero inflation. Recent evaluations, however, have shown that dedicated scRNA-seq tools provide no advantage compared to traditional bulk RNA-seq tools. We introduce a weighting strategy, based on a zero-inflated negative binomial model, that identifies excess zero counts and generates gene- and cell-specific weights to unlock bulk RNA-seq DE pipelines for zero-inflated data, boosting performance for scRNA-seq.
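The weighting idea can be made concrete as the posterior probability that a count was generated by the negative binomial (NB) component rather than by the excess-zero component of a zero-inflated negative binomial (ZINB) model. The Python sketch below assumes the ZINB parameters (NB mean `mu`, NB size or inverse dispersion `theta`, zero-inflation probability `pi`) have already been estimated for each gene and cell; the function names and parameter values are hypothetical, and this illustrates the posterior-probability formula rather than the authors' implementation.

```python
def nb_prob_zero(mu: float, theta: float) -> float:
    """P(Y = 0) under an NB distribution with mean mu and size (inverse dispersion) theta."""
    return (theta / (theta + mu)) ** theta


def zinb_weight(y: int, mu: float, theta: float, pi: float) -> float:
    """Posterior probability that count y comes from the NB component of a ZINB model.

    Assumption: pi is the zero-inflation probability and (mu, theta) parameterize
    the NB component. Positive counts cannot be excess zeros, so their weight is 1.
    """
    if y > 0:
        return 1.0
    nb_zero = (1.0 - pi) * nb_prob_zero(mu, theta)  # zero explained by the NB component
    return nb_zero / (pi + nb_zero)                 # normalize by total P(Y = 0)


# Hypothetical fitted parameters for one gene in three cells.
counts = [0, 0, 7]
mus = [5.0, 0.2, 6.0]
thetas = [2.0, 2.0, 2.0]
pis = [0.3, 0.3, 0.3]

weights = [zinb_weight(y, m, t, p) for y, m, t, p in zip(counts, mus, thetas, pis)]
print([round(w, 3) for w in weights])
```

Positive counts always receive weight 1. A zero receives a low weight when the NB component alone makes it unlikely (high `mu`, first cell) and a higher weight when a zero is plausible under the NB component (low `mu`, second cell), so a weighted bulk DE method would down-weight the zeros most likely to be excess zeros.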
Statistical and computational methods for single-cell transcriptome sequencing and metagenomics
I propose statistical methods and software for the analysis of single-cell transcriptome sequencing (scRNA-seq) and metagenomics data. Specifically, I present a general and flexible zero-inflated negative binomial-based wanted variation extraction (ZINB-WaVE) method, which extracts low-dimensional signal from scRNA-seq read counts, accounting for zero inflation (dropouts), over-dispersion, and the discrete nature of the data. Additionally, I introduce an application of the ZINB-WaVE method that identifies excess zero counts and generates gene- and cell-specific weights to unlock bulk RNA-seq differential expression pipelines for zero-inflated data, boosting performance for scRNA-seq analysis. Finally, I present a method to estimate bacterial abundances in human metagenomes using full-length 16S sequencing reads.
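For reference, a standard parameterization of the ZINB distribution is written out below, with NB mean μ, inverse dispersion θ, zero-inflation probability π, and δ₀(y) equal to 1 when y = 0 and 0 otherwise. This notation is a common convention assumed here, not necessarily the notation used in the thesis.

```latex
f_{\mathrm{NB}}(y;\mu,\theta) = \frac{\Gamma(y+\theta)}{\Gamma(y+1)\,\Gamma(\theta)}
  \left(\frac{\theta}{\theta+\mu}\right)^{\theta}
  \left(\frac{\mu}{\theta+\mu}\right)^{y}, \qquad y \in \{0, 1, 2, \dots\}

f_{\mathrm{ZINB}}(y;\mu,\theta,\pi) = \pi\,\delta_{0}(y) + (1-\pi)\, f_{\mathrm{NB}}(y;\mu,\theta)

\mathbb{E}_{\mathrm{NB}}[Y] = \mu, \qquad \operatorname{Var}_{\mathrm{NB}}(Y) = \mu + \frac{\mu^{2}}{\theta}
```

The NB variance exceeds the mean, which captures over-dispersion, while the point mass π at zero captures zero inflation; at y = 0 the mixture gives P(Y = 0) = π + (1 − π)(θ/(θ + μ))^θ.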
FiberGrowth Pipeline: A Framework Toward Predicting Fiber-Specific Growth From Human Gut Bacteroidetes Genomes
Dietary fibers impact colonic health through the production of short-chain fatty acids. A low-fiber diet has been linked to lower bacterial diversity, obesity, type 2 diabetes, and the promotion of mucosal pathogens. Glycoside hydrolases (GHs) are important enzymes involved in the bacterial catabolism of fiber into short-chain fatty acids. However, the GHs involved in glycan breakdown (adhesion, hydrolysis, and fermentation) are organized in polysaccharide utilization loci (PULs) with complex modularity. Our goal was to explore how the capacity of strains from the Bacteroidetes phylum to grow on fiber could be predicted from their genome sequences. We designed an in silico pipeline called FiberGrowth and independently validated it for seven different fibers on 28 genomes from Bacteroidetes type strains. To do so, we compared existing GH annotation tools and built PUL models using published growth and gene expression data. FiberGrowth's prediction performance in terms of true positive rate (TPR) and false positive rate (FPR) strongly depended on the available data and the fiber: arabinoxylan (TPR: 0.89 and FPR: 0), inulin (0.95 and 0.33), heparin (0.8 and 0.22), laminarin (0.38 and 0.17), levan (0.3 and 0.06), mucus (0.13 and 0.38), and starch (0.73 and 0.41). Being able to better predict fiber breakdown by bacterial strains would help us understand their impact on human nutrition and health. With further gene expression experiments and discoveries from structural analysis, we hope that computational tools like FiberGrowth will help researchers prioritize and design in vitro experiments.
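FiberGrowth's performance is summarized by a TPR and an FPR per fiber. As a reminder of what these two numbers measure, the sketch below computes both from binary growth predictions compared with observed growth for one fiber. The strain labels are hypothetical, and the function `tpr_fpr` is an illustration, not part of the FiberGrowth pipeline.

```python
from typing import Sequence, Tuple


def tpr_fpr(predicted: Sequence[bool], observed: Sequence[bool]) -> Tuple[float, float]:
    """Return (TPR, FPR) for binary growth predictions against observed growth."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    pairs = list(zip(predicted, observed))
    tp = sum(1 for p, o in pairs if p and o)          # predicted growth, grows
    fn = sum(1 for p, o in pairs if not p and o)      # missed growth
    fp = sum(1 for p, o in pairs if p and not o)      # predicted growth, does not grow
    tn = sum(1 for p, o in pairs if not p and not o)  # correctly predicted no growth
    # TPR = TP / (TP + FN); FPR = FP / (FP + TN); NaN when a denominator is zero.
    tpr = tp / (tp + fn) if tp + fn else float("nan")
    fpr = fp / (fp + tn) if fp + tn else float("nan")
    return tpr, fpr


# Hypothetical labels for eight strains on one fiber (True = grows).
observed = [True, True, True, True, False, False, False, False]
predicted = [True, True, True, False, False, False, True, False]
tpr, fpr = tpr_fpr(predicted, observed)
print(f"TPR={tpr:.2f} FPR={fpr:.2f}")  # TPR=0.75 FPR=0.25
```

Read this way, the FPR of 0 for arabinoxylan means that no strain observed not to grow was predicted to grow, whereas the TPR of 0.13 for mucus means that most strains observed to grow on mucus were missed.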
Un avenir renouvelable (A renewable future)
Novel single-cell transcriptome sequencing assays allow researchers to measure gene expression levels at the resolution of single cells and offer an unprecedented opportunity to investigate fundamental biological questions at the molecular level, such as stem cell differentiation or the discovery and characterization of rare cell types. However, such assays raise challenging statistical and computational questions and require the development of novel methodology and software. Using stem cell differentiation in the mouse olfactory epithelium as a case study, this integrated workflow provides a step-by-step tutorial to the methodology and associated software for the following four main tasks: (1) dimensionality reduction accounting for zero inflation and over-dispersion and adjusting for gene- and cell-level covariates; (2) cell clustering using resampling-based sequential ensemble clustering; (3) inference of cell lineages and pseudotimes; and (4) differential expression analysis along lineages.
Publisher Correction: A general and flexible method for signal extraction from single-cell RNA-seq data.
The original PDF version of this Article contained errors in two equations. In Eq. (1), all Γ symbols were inadvertently omitted. In the second equation in the subsection entitled '1. Dispersion optimization' within the Methods section 'ZINB-WaVE estimation procedure', all Ψ symbols were inadvertently omitted. These errors have been corrected in the PDF version of the Article; the HTML version was correct from the time of publication.
Mating types and reproductive success
Reproductive success of 239 crosses (478 matings). Columns represent the 16 mat A strains and rows represent the 15 mat a strains, with isolate numbers along the row and column headings of the matrix. Numbers within matrix cells indicate the reproductive success ratings (see below) of the two reciprocal matings of the cross between the corresponding isolates (mat a isolate as the perithecial parent/mat A isolate as the perithecial parent). The matrix cells are shaded in proportion to the reproductive success of the better of the two reciprocal matings (see Figure 5B in the corresponding publication). Additional row and column headings indicate the lineage designation (see Figure 1). Isolates P581 and TX8127 are tester strains of N. discreta sensu stricto. Categories of reproductive success, corresponding to successive stages of reproductive development from the development of perithecia to the formation of black ascospores, were scored as follows: 0 if sterile (no perithecia produced); 1 if barren perithecia (no ostiole developed); 2 if perithecia developed ostioles but no spores; 3 and 4 depending on the proportion of black ascospores (50% threshold).